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mRNA editing of sequences of many species is analyzed. The nature of the inserted 
nucleotides and the position of the insertion sites, once the edited peptide chain is fixed,
are explained by introducing a minimum principle in the framework of the crystal basis
model of the genetic code introduced by the authors.
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1 Introduction 

One of the basic dogmas of molecular genetics states that the information contained in DNA
flows faithfully, via the mRNA intermediate molecule, into the production of proteins. In 1986
[1], it was discovered in trypanosome mitochondria that the information contained in DNA
is not always found unmodified in the RNA products. In the following fifteen years, it was
demonstrated that in several organisms (kinetoplastid protozoa, mitochondria or chloroplasts
of plants, mammalian cells), some as yet unknown biochemical machinery alters the sequence of
the final transcription products. This process is called RNA editing. For an extensive list of
articles on RNA editing, the reader can consult the many web sites on RNA editing [2], [3].

The alteration of the sequence of nucleotides in the RNA occurs after it has been transcribed
from DNA but before it is translated into protein. Post-transcriptional modifications have also
been observed and interpreted as RNA editing. RNA editing occurs by two distinct mechanisms:
1) substitution editing: chemical alteration of individual nucleotides (the equivalent of point
mutations), usually C → U. These alterations are catalyzed by proteins that recognize a specific
target sequence of nucleotides (much like restriction enzymes). 2) insertion/deletion editing:
insertion or deletion of nucleotides in the RNA (usually U or C). It is generally believed that
these alterations are mediated by guide RNA molecules (gRNA) that base-pair as best they can
with the RNA to be edited and serve as a template for the addition (or removal) of nucleotides
in the target [4]. However, there is no evidence for the presence of gRNA in all the biological
species concerned.

The main features of mRNA editing are:

- the insertion (generally multiple) of U nucleotides or of a single C nucleotide.

- the large majority of the transitions involve C → U. A few cases of transitions U → C have
also been reported.

- mRNA editing modifies a few percent (0.8 to 5.8 %) of the nucleotides of a specific transcript.

- mRNA editing appears as a random event, but most of the edited nucleotides occur at
certain hotspots.

As a consequence of RNA editing, there is a change in the final biosynthesis of amino
acids, the most frequent changes being Pro → Ser, Ser → Leu, Ser → Phe. The deep mechanism
which causes RNA editing is still unknown. The understanding of the event is complicated:
from a thermodynamic point of view a change, e.g. C → U, takes place if it is favored by
the change of enthalpy or entropy, but should this be the case, the change should appear in all
organisms. Moreover, from a microscopic (quantum mechanical) point of view, the change
should occur in both directions, i.e. C ↔ U. It seems that the primary aim of mRNA editing
is the evolution and conservation of protein structures, creating a meaningful coding sequence
specific for a particular amino acid sequence.

The purpose of this paper is to propose an effective model to describe RNA editing.
Our model does not explain why, where and in which organisms editing happens, but it gives
a framework to understand some specific features of the phenomenon. The paper is organized
as follows. In section 2, we analyze the mRNA editing in Physarum polycephalum. We first
consider this biological species for two reasons: the high statistics of the available data, and
the fact that this editing is mainly characterized by single C insertions, allowing a
more detailed and accurate analysis. We show that the existence of preferred sites as well
as the nature of the insertions can be understood by requiring the minimization of a suitable
function defined on the codon sequence. This function can be defined because we identify each
codon by a set of four half-integer labels. In section 3, we then analyze the generally multiple U
insertions occurring in kinetoplastid protozoa and we show that also in this case the mRNA
editing is understood by a similar minimization procedure. In section 4, we briefly discuss
substitution editing. Finally, we give a few conclusions and highlights for future developments.

2 Insertion editing by C 

The mRNA editing in Physarum polycephalum, discovered in 1991 by R. Mahendran, M.
Spottswood and D. Miller [5], has been extensively studied, and it presents the peculiar feature
of being characterized mainly by C insertions. The main features of the RNA editing in Physarum
polycephalum are that in about 80 % of the cases the insertion occurs in the third position of
the codon, that the insertion sites are non-random, and that in about 68 % of the cases the C is
inserted after a purine-pyrimidine dinucleotide. Moreover, no rule for the location of the editing
sites has been determined, even if the presence of hotspots has been noted. We have analyzed
three published sequences of mRNA editing in Physarum polycephalum, namely a portion of the
mitochondrial ATP-9 mRNA and the mRNAs of cytochromes c and b [5], [6], [7], showing
respectively 54 insertions of a single C, 62 insertions (59 single C, 1 single U) and 40 insertions
(31 single C, 6 single U). As a whole we have analyzed 151 single insertions (144 C and 7 U),
remarking that the same amino acid chain could have been obtained by insertion of C in a site
different from the observed one or by insertion of a nucleotide different from C or U.

In the whole of the analyzed sequences we have remarked (inserted C nucleotides are underlined):

1. the presence of at least 22 alternative insertion sites for C (15 % of the cases, see Table 2),
which would produce the same final amino acids, thus not altering the protein biosynthesis.
For example, at the insertion site 9 of Ref. [5], the (observed) sequence is ACC TTA (Thr
Leu), while the (unobserved) sequence with alternative insertion site may be ACT CTA.

2. in at least 108 (resp. 98 and 63) of the 144 single C insertions (75 %, resp. 68 % and
44 % of the cases, see Table 3), the same final amino acid may have been obtained by a
single U (resp. A and G) insertion. Note that in writing Table 3, when the insertion site
is ambiguous, i.e. when the inserted C is next to another C, sometimes a shift has been
performed.



Moreover, we have to consider the two cases GCC UCU → GCU ACU (site 16') and CUU
AAA → UUA AAR (site 21*), where the C insertion is replaced by an A or an R (R = A, G)
insertion together with a shift of the insertion site. A similar analysis has been performed for
the single U insertions.

These observations raise two natural questions: 1) why are the insertion sites the observed ones
and not the others? 2) why is the C insertion largely preferred?
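The counting of alternative sites can be illustrated by brute force. The following is a minimal sketch, assuming the universal genetic code and taking the unedited fragment at site 9 to be ACUUA (inferred here from the observed ACC UUA and the alternative ACU CUA, written in RNA letters). The function names are hypothetical and not part of the original analysis. It enumerates every single-nucleotide insertion that still yields Thr Leu.

```python
from itertools import product

BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Universal genetic code in one-letter notation ('*' = stop), base order U, C, A, G.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}


def translate(rna):
    """Translate an RNA string codon by codon, ignoring an incomplete last codon."""
    usable = len(rna) - len(rna) % 3
    return "".join(CODE[rna[i:i + 3]] for i in range(0, usable, 3))


def single_insertions(unedited, peptide):
    """Edited sequences reachable by one inserted nucleotide that encode `peptide`."""
    hits = set()
    for pos in range(len(unedited) + 1):
        for nucleotide in BASES:
            edited = unedited[:pos] + nucleotide + unedited[pos:]
            if translate(edited) == peptide:
                hits.add(edited)
    return hits


# Assumed unedited fragment at site 9; "TL" stands for Thr Leu.
candidates = single_insertions("ACUUA", "TL")
print(sorted(candidates))
```

Besides the observed ACCUUA, the enumeration returns the shifted C insertion ACUCUA and the insertions of U, A or G in the third position of the first codon, which is the situation behind the two questions raised above.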

In physics, when a phenomenon occurs in one fixed way among many possible choices,
one assumes that some minimum principle has to be satisfied. The simplest example is the
straight path of light (in the absence of strong gravitational fields), corresponding to the shortest
path between two points in Euclidean geometry (the so-called geodesics). Can we think of
the existence of a sort of minimum principle to explain mRNA editing and/or other processes
in DNA? There are several technical and conceptual difficulties in this way of tackling the
problem. One should give a mathematical model of RNA and identify the sequence by
a possibly discrete set of variables. Defining a topological metric space depending on discrete
variables and introducing on it a variational principle is a hard mathematical problem. Moreover,
we do not have a priori any theoretical guidelines, such as the Hamiltonian and/or Lagrangian
formalisms, so we must have some good empirical grounding to begin with. Of course we do
not expect biological processes to be deterministic, as is the case in classical mechanics; so
we have to reconcile a minimum principle, if any, with the random nature of the events, as in
quantum mechanics. In the present note, as a first step, we look for a simple function which
would take the smallest value in the observed configuration of insertion sites and single C
insertions, with respect to the configurations with insertion in alternative sites and/or with a
single U, G, A insertion.

The starting point for a mathematical model of DNA or mRNA is the crystal basis
model of the genetic code [8], where the nucleotides are assigned to the 4-dimensional irreducible
fundamental representation (1/2, 1/2) of $U_{q \to 0}(sl(2) \oplus sl(2))$ and any sequence of N
nucleotides to the N-fold tensor product of (1/2, 1/2) (for codons, see [8] or Table 4 of [9], here
reported in Table 1 for completeness). As a consequence of the model, any nucleotide sequence is
characterized as an element of a vector space. Therefore, functions can be defined on this space
and can be computed on the sequence of codons. It is perhaps worthwhile to emphasize that, for
the aim of this paper, it is not necessary to understand completely either the mathematical
structure of the crystal basis or the reason for dealing with such a sophisticated mathematical
structure (see
e.g. HP). The essential point is that any codon is identified by a set of four half-integer labels 
and functions can be defined on the codons. We make the assumption that the location sites 
for the insertion of a nucleotide should minimize the following function for the mRNA or cDNA 



$$A_0 = \exp\Big[ -\sum_k \big( 4\alpha_c\, C_H^k + 4\beta_c\, C_V^k + 2\gamma_c\, J_{3,H}^k \big) \Big] \qquad (1)$$

where the sum in k is over all the codons in the edited sequence, $C_H^k$ ($C_V^k$) and $J_{3,H}^k$
are the values of the Casimir operator and of the third component of the generator of the
H-sl(2) (V-sl(2)), see [8], in the irreducible representation to which the k-th codon belongs, see
Table 1. Let us recall that the value of the Casimir operator on a state in an irreducible
representation (IR) labelled by $(J_H, J_V)$ is

$$C_H(J_H, J_V) = J_H (J_H + 1) \quad \text{and} \quad C_V(J_H, J_V) = J_V (J_V + 1) \qquad (2)$$

In (1) the simplifying assumption has been made that the dependence of $A_0$ on the irreducible
representation to which the codon belongs is given only by the values of the Casimir operators.
The parameters $\alpha_c$, $\beta_c$, $\gamma_c$ are constants, depending on the biological species.
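The evaluation of the exponent of $A_0$ on a codon sequence can be sketched in a few lines. The sketch below rests on assumptions that are not spelled out in the text: the nucleotide labels (J_3,H, J_3,V) are taken as C = (+1/2, +1/2), U = (-1/2, +1/2), G = (+1/2, -1/2), A = (-1/2, -1/2), and a bracket-pairing rule is used that is meant to reproduce the (J_H, J_V) labels of Table 1. The function names and the numerical parameter values are hypothetical.

```python
from fractions import Fraction

# Assumed signs of J_3 for each nucleotide (in units of 1/2).
H_SIGN = {"C": 1, "G": 1, "U": -1, "A": -1}
V_SIGN = {"C": 1, "U": 1, "G": -1, "A": -1}


def spin(signs):
    """J of a sign word: each '+' cancels against the next unmatched '-' to its right."""
    open_plus = paired = 0
    for s in signs:
        if s > 0:
            open_plus += 1
        elif open_plus:
            open_plus -= 1
            paired += 1
    return Fraction(len(signs) - 2 * paired, 2)


def labels(codon):
    """Return (J_H, J_V, J_3H, J_3V) for a codon written with the letters C, U, G, A."""
    h = [H_SIGN[n] for n in codon]
    v = [V_SIGN[n] for n in codon]
    return spin(h), spin(v), Fraction(sum(h), 2), Fraction(sum(v), 2)


def casimir(j):
    return j * (j + 1)


def log_A0(codons, alpha, beta, gamma):
    """Exponent of A_0: minus the sum over codons of 4*alpha*C_H + 4*beta*C_V + 2*gamma*J_3H."""
    total = Fraction(0)
    for codon in codons:
        j_h, j_v, j3_h, _ = labels(codon)
        total += 4 * alpha * casimir(j_h) + 4 * beta * casimir(j_v) + 2 * gamma * j3_h
    return -total


# Hypothetical parameter values, chosen only for illustration.
ALPHA_C, BETA_C, GAMMA_C = 1, 2, 13
observed = log_A0(["ACC", "UUA"], ALPHA_C, BETA_C, GAMMA_C)
alternative = log_A0(["ACU", "CUA"], ALPHA_C, BETA_C, GAMMA_C)
print(observed, alternative)
```

Exact fractions are used so that equalities between two configurations are not blurred by rounding.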

The minimum of $A_0$ has to be computed over the whole set of configurations satisfying the
constraints: i) the starting point should be the mtDNA and ii) the final peptide chain should
not be modified. The global minimization of expression (1) is ensured if $A_0$
takes the smallest value locally, i.e. in the neighborhood of each insertion site. The form of the
function $A_0$ is rather arbitrary; one of the reasons for this choice is that the chosen expression
is computationally easily tractable. If the parameters $\alpha_c$, $\beta_c$, $\gamma_c$ are strictly
positive with $\gamma_c/6 > \beta_c > \alpha_c$, the minimization of (1) explains the observed
configurations in all cases, except for the cases 12, 33, 45 and 41*, where there is equality, and
the cases 18* and 51*, where the minimization is not satisfied (see Table 2).
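As a worked check, the exponents can be compared for two entries of Table 2. The shorthand S for the sum in the exponent is introduced here only for convenience, and the codon labels are those implied by Table 1 (for instance ACC has $J_H = J_V = 3/2$, $J_{3,H} = 1/2$). Site 9 illustrates a case where the observed configuration gives the smaller value; site 18* (observed AGU CUG, alternative AGC UUG) illustrates one of the two cases where it does not.

```latex
\begin{align*}
S &\equiv \sum_k \left( 4\alpha_c C_H^k + 4\beta_c C_V^k + 2\gamma_c J_{3,H}^k \right),
  \qquad A_0 = e^{-S} \\[4pt]
\text{site 9:}\quad
S(\mathrm{ACC\ UUA}) &= (15\alpha_c + 15\beta_c + \gamma_c) + (15\alpha_c + 3\beta_c - 3\gamma_c)
  = 30\alpha_c + 18\beta_c - 2\gamma_c \\
S(\mathrm{ACU\ CUA}) &= (3\alpha_c + 15\beta_c - \gamma_c) + (3\alpha_c + 3\beta_c - \gamma_c)
  = 6\alpha_c + 18\beta_c - 2\gamma_c \\
S_{\mathrm{obs}} - S_{\mathrm{alt}} &= 24\alpha_c > 0
  \;\Longrightarrow\; A_0(\mathrm{obs}) < A_0(\mathrm{alt}) \\[4pt]
\text{site 18*:}\quad
S(\mathrm{AGU\ CUG}) &= (3\alpha_c + 15\beta_c - \gamma_c) + (3\alpha_c + 3\beta_c + \gamma_c)
  = 6\alpha_c + 18\beta_c \\
S(\mathrm{AGC\ UUG}) &= (15\alpha_c + 15\beta_c + \gamma_c) + (15\alpha_c + 3\beta_c - \gamma_c)
  = 30\alpha_c + 18\beta_c \\
S_{\mathrm{obs}} - S_{\mathrm{alt}} &= -24\alpha_c < 0
  \;\Longrightarrow\; A_0(\mathrm{obs}) > A_0(\mathrm{alt})
\end{align*}
```

For the choice of the inserted nucleotide at a fixed site, the same bookkeeping gives $S(\mathrm{ACC}) - S(\mathrm{ACU}) = 12\alpha_c + 2\gamma_c > 0$, so that C is preferred to U in the third position of a Thr codon whenever the parameters are positive.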

In order to deal with the remaining cases and to take into account the observed fact that
the dinucleotide preceding the insertion site is predominantly a purine-pyrimidine, we add to
the exponent of the function $A_0$ an "interaction term", which is equivalent to multiplying (1) by
the function $A_1$, where

$$A_1 = \exp\Big[ \sum_i \big( -4\omega_{1c}\, j_{3,V}^{(i)} j_{3,V}^{(i-1)} + 4\omega_{2c}\, j_{3,V}^{(i)} j_{3,V}^{(i-2)} \big) \Big] \qquad (3)$$

The sum in i is over the insertion sites, $j_{3,V}^{(i-n)}$ is the value of the third component of the
generator of V-sl(2) of the n-th nucleotide preceding the inserted nucleotide C (i.e. +1/2 for
C, U and -1/2 for G, A) and $\omega_{1c}$, $\omega_{2c}$ are constants, depending on the biological
species. In the case where the insertion site cannot be unambiguously determined, i.e. when the
inserted nucleotide is next to a nucleotide of the same type, (3) should be computed in the
configuration which minimizes the value of $A_1$. If $\omega_{1c} > \omega_{2c} > 0$ and
$\omega_{1c} > 12\alpha_c$, the minimization of the function $A = A_0 A_1$ explains all the observed
positions for C insertions, see Table 2. It is reasonable, but not taken into account in (1), to
argue that the insertion sites and the nature of the inserted nucleotides also depend on the
content of the particular sequence. Moreover, A might be considered as the first terms of an
expansion, the next terms involving representations corresponding to more than one codon, the
nature of the nucleotides following the insertion site, etc. These further terms may play a role
in a more refined analysis.
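The effect of the interaction term on the choice of the site can be made concrete with a short sketch. It assumes the pairwise form of the exponent in which the inserted nucleotide couples to the first and to the second preceding nucleotide with weights -4ω_1c and +4ω_2c respectively. The function name and the numerical values of the constants are hypothetical; they only respect ω_1c > ω_2c > 0.

```python
from fractions import Fraction

HALF = Fraction(1, 2)
# Third component of V-sl(2): +1/2 for pyrimidines (C, U), -1/2 for purines (G, A).
J3V = {"C": HALF, "U": HALF, "G": -HALF, "A": -HALF}


def log_A1_site(preceding_dinucleotide, inserted, omega1, omega2):
    """Contribution of one insertion site to the exponent of A_1."""
    second_before, first_before = preceding_dinucleotide
    j_inserted = J3V[inserted]
    return (-4 * omega1 * j_inserted * J3V[first_before]
            + 4 * omega2 * j_inserted * J3V[second_before])


OMEGA1, OMEGA2 = 13, 1  # hypothetical values with omega1 > omega2 > 0
contributions = {d: log_A1_site(d, "C", OMEGA1, OMEGA2) for d in ("GU", "UU", "AG", "CA")}
print(contributions)
```

With these values the purine-pyrimidine dinucleotide (GU) gives the lowest contribution, followed by pyrimidine-pyrimidine (UU), purine-purine (AG) and pyrimidine-purine (CA), in line with the observed bias that motivates the term.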

An analysis of the 7 single U insertions shows that in 6 cases (sites 22*, 10', 18', 22', 24',
26') (resp. 3 cases, sites 10', 18', 24') the replacement U → C (resp. U → R) gives the
same amino acid. In 4 of these cases the minimization of Eq. (1) would prefer the insertion of
C, giving rise to UUU → UUC, site 22*; CUU → CUC, sites 10', 18'; ACU → ACC, site 24',
while in sites 22', 26' UUA is preferred to CUA. This may explain why the U insertions
are so rare compared with the C insertions. Also in this case further terms in A may help for
a more refined analysis of the preferred configuration of the insertions.



3 Insertion editing by U 

The mRNA editing with insertion of U has been observed in particular in a group of parasitic
protozoa known as kinetoplastid protozoa. Contrary to the C insertion case, where only single
nucleotide insertions occur, the main characteristic of the mRNA editing by U insertion is
that the U nucleotides are inserted in blocks. In this way, almost all amino acids can be
obtained, with a large proportion of Phe and Leu. Many sequences where mRNA editing with U
insertion occurs can be found in [10], and an extensive list of references on U insertion editing
can be found in ||. We limit ourselves to citing the first papers on the subject [11], [12], [13], [14].
The table below shows the species and the genes that have been used in our analysis. In 
this table, COX = cytochrome oxidase, Cyt b = cytochrome b, G = G-rich region, NADH 
= NADH dehydrogenase, RPS12 = ribosomal protein S12, MURF = maxicircle unidentified 
reading frame. The number of edited sites is quite large (more than 1000 sites). 



species                 ATPase6  COX I  COX II  COX III  Cyt b   G3    G4
Crithidia fasciculata      X       -      X        X       X     -     -
Leishmania tarentolae      X       -      X        X       X     X     X
Phytomonas serpens         X       -      -        -       -     X     -
Trypanosoma brucei         X       -      -        X       X     -     X
Trypanosoma borreli        -       X      -        -       X     -     -
Trypanosoma cruzi          X       -      X        -       -     -     -

species                 MURF2   NADH3  NADH7  NADH8  NADH9  RPS12
Crithidia fasciculata      X      -      X      -      -      X
Leishmania tarentolae      X      X      X      X      X      X
Phytomonas serpens         -      -      -      X      -      X
Trypanosoma brucei         X      X      X      X      X      X
Trypanosoma borreli        -      -      -      -      -      X
Trypanosoma cruzi          -      -      -      -      -      -


Species and genes used in the U insertion mRNA editing analysis. 

Following the same analysis as in the previous section, we make the assumption that the
location sites for the insertion of a U nucleotide should minimize the following function for the
mRNA:

$$A'_0 = \exp\Big[ -\sum_k \big( 4\alpha_u\, C_H^k + 4\beta_u\, C_V^k + 2\gamma_u\, J_{3,H}^k \big) \Big] \qquad (4)$$



When choosing the parameters $\alpha_u$, $\beta_u$, $\gamma_u$ such that $\alpha_u, \gamma_u < 0$ and
$\beta_u > 0$ with $\gamma_u/6 < \alpha_u$, the minimization of (4) explains all the observed
configurations, except in the cases CGU and GGU, where the configurations CGA and CGU on the
one hand and GGA and GGU on the other hand are equivalent (inserted U are underlined).
Multiplying Eq. (4) by the corrective term



A\ = exp 






(5) 



with $\omega_{1u} < 0$, the observed configurations become the preferred ones.

It may happen that different U insertions lead to the same configuration of amino acids
(note however that in the U insertion case this is much less frequent than in the C insertion
case, since the U nucleotides are inserted in blocks). In the analyzed
sequences, we have noticed six such possible alternative configurations:

- in Leishmania tarentolae, gene NADH8, at edited position 229, one observes the configuration
$C_1$ = GCU CUA, the alternative configuration is $C_2$ = GCC UUA, and one has
$A'_0(C_1) < A'_0(C_2)$.

- in Phytomonas serpens, gene NADH8, at edited position 355, one observes the configuration
$C_1$ = GCA AUU, the alternative configuration is $C_2$ = GCU AUA, and both configurations
are equivalent: $A'_0(C_1) = A'_0(C_2)$.

- in Trypanosoma borreli, gene cytochrome c oxidase I, at edited position 1375, one observes
the configuration $C_1$ = GUA AUU, the alternative configuration is $C_2$ = GUU AUA, and both
configurations are equivalent: $A'_0(C_1) = A'_0(C_2)$.

- in Trypanosoma brucei, gene cytochrome oxidase III, at edited position 645, one observes the
configuration $C_1$ = GCA UUG UUA UUU AUU, the alternative configuration is $C_2$ = GCU
UUA UUG UUU AUA, and both configurations are equivalent: $A'_0(C_1) = A'_0(C_2)$.

- in Trypanosoma brucei, gene NADH7, at edited position 988, one observes the configuration
$C_1$ = CCG GGU, the alternative configuration is $C_2$ = CCU GGG, and one has
$A'_0(C_1) > A'_0(C_2)$. This is a counter-example; however, in the configuration $C_2$ the
nucleotide U is inserted after a C, which is not favored.

- in Trypanosoma brucei, gene NADH8, at edited position 251, one observes the configuration
$C_1$ = UGC CCU, the alternative configuration is $C_2$ = UGU CCC, and one has
$A'_0(C_1) > A'_0(C_2)$. This one is also a counter-example.
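Three of the comparisons listed above can be reproduced with a short sketch. It assumes the nucleotide labels C = (+1/2, +1/2), U = (-1/2, +1/2), G = (+1/2, -1/2), A = (-1/2, -1/2) and a bracket-pairing rule meant to reproduce the labels of Table 1. The function names are hypothetical, and the parameter values are illustrative choices that only respect $\alpha_u, \gamma_u < 0 < \beta_u$ and $\gamma_u/6 < \alpha_u$.

```python
from fractions import Fraction

# Assumed signs of J_3 for each nucleotide (in units of 1/2).
H_SIGN = {"C": 1, "G": 1, "U": -1, "A": -1}
V_SIGN = {"C": 1, "U": 1, "G": -1, "A": -1}


def spin(signs):
    """J of a sign word: each '+' cancels against the next unmatched '-' to its right."""
    open_plus = paired = 0
    for s in signs:
        if s > 0:
            open_plus += 1
        elif open_plus:
            open_plus -= 1
            paired += 1
    return Fraction(len(signs) - 2 * paired, 2)


def log_A0(codons, alpha, beta, gamma):
    """Exponent of the Casimir-type function: minus the sum over the codons."""
    total = Fraction(0)
    for codon in codons:
        h = [H_SIGN[n] for n in codon]
        v = [V_SIGN[n] for n in codon]
        j_h, j_v = spin(h), spin(v)
        total += (4 * alpha * j_h * (j_h + 1) + 4 * beta * j_v * (j_v + 1)
                  + 2 * gamma * Fraction(sum(h), 2))
    return -total


ALPHA_U, BETA_U, GAMMA_U = -1, 1, -7  # hypothetical values

CASES = {
    "L. tarentolae NADH8, 229": ("GCU CUA", "GCC UUA"),
    "P. serpens NADH8, 355": ("GCA AUU", "GCU AUA"),
    "T. brucei NADH7, 988": ("CCG GGU", "CCU GGG"),
}


def compare(observed, alternative):
    """Return '<', '=' or '>' for the observed versus the alternative configuration."""
    a = log_A0(observed.split(), ALPHA_U, BETA_U, GAMMA_U)
    b = log_A0(alternative.split(), ALPHA_U, BETA_U, GAMMA_U)
    return "<" if a < b else ">" if a > b else "="


results = {name: compare(*pair) for name, pair in CASES.items()}
print(results)
```

Since the exponential is monotonic, comparing the exponents is enough to order the values of the function on two configurations.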

In the above cases where the insertion sites are not unambiguously determined, multiplying
Eq. (4) by the following corrective term

$$A''_1 = \exp\Big[ \sum_i \big( -4\omega_{1u}\, j_{3,V}^{(i)} j_{3,V}^{(i-1)} + 4\omega_{2u}\, j_{3,V}^{(i)} j_{3,V}^{(i-2)} \big) \Big] \qquad (6)$$

with $\omega_{1u} < 0$ and $\omega_{1u} + \omega_{2u} > 0$, the observed configurations become the
preferred ones.

In conclusion, the observed U insertions minimize the function $A' = A'_0 A''_1$, except for two
cases for which alternative insertion sites exist, where the function $A'$ takes a lower value, at
least in the simplified hypothesis that $A''_1$ is a perturbative term to $A'_0$. It should however
be noted that such a perturbative term takes into account the nature of the neighboring
nucleotides, and that the experimentally observed bias in the selection of the insertion sites
shows an important effect of the neighbors.



4 Substitution editing by C → U

Substitution editing of mRNA by C → U occurs for example in plant mitochondria and
chloroplasts and in the gene apoB in mammals (see web site ^ 4 in Ref. 0). For our study we
have used the COXII gene of wheat [15]. Similar radical amino acid substitutions in plant
COXII sequences have been inferred. Although the statistics is rather poor, we can extract
interesting features. In the wheat COXII gene, one observes the following substitutions: CGG
→ UGG (twice), CCU → UCU, UCA → UUA (twice), UCG → UUG, CGU → UGU, ACG
→ AUG, so that the corresponding amino acids Trp, Leu, Leu, Cys, Met are correctly coded
by the universal code. In [16], the following substitution editing has been observed in several
wheat genes (COXII, COXIII, Cob, NAD3, NAD4, RPS12): CGG → UGG (seven times), CAC
→ UAC, CAU → UAU, UCA → UUA, UCG → UUG (twice), UCU → UUU (three times),
CUC → UUC, CCG → CUG, CCA → CUA. In the case of the mammalian gene apoB, the
editing depends on the location of the mRNA in the body of the species under consideration
(editing in the intestine but no editing in the liver). It is characterized by CAA → UAA (Gln
→ Stop codon).

As before, one can easily check that the function $A'_0$ of Eq. (4) takes a smaller value on the
configuration corresponding to the substituted nucleotide than on the original one.
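This check can be sketched for some of the wheat COXII substitutions quoted above. The sketch assumes the nucleotide labels C = (+1/2, +1/2), U = (-1/2, +1/2), G = (+1/2, -1/2), A = (-1/2, -1/2) and a bracket-pairing rule meant to reproduce the labels of Table 1. The function names are hypothetical, and the parameter values are illustrative choices respecting $\alpha_u, \gamma_u < 0 < \beta_u$ and $\gamma_u/6 < \alpha_u$.

```python
from fractions import Fraction

# Assumed signs of J_3 for each nucleotide (in units of 1/2).
H_SIGN = {"C": 1, "G": 1, "U": -1, "A": -1}
V_SIGN = {"C": 1, "U": 1, "G": -1, "A": -1}


def spin(signs):
    """J of a sign word: each '+' cancels against the next unmatched '-' to its right."""
    open_plus = paired = 0
    for s in signs:
        if s > 0:
            open_plus += 1
        elif open_plus:
            open_plus -= 1
            paired += 1
    return Fraction(len(signs) - 2 * paired, 2)


def log_codon(codon, alpha, beta, gamma):
    """Exponent contributed by a single codon (minus the Casimir-type sum)."""
    h = [H_SIGN[n] for n in codon]
    v = [V_SIGN[n] for n in codon]
    j_h, j_v = spin(h), spin(v)
    return -(4 * alpha * j_h * (j_h + 1) + 4 * beta * j_v * (j_v + 1)
             + 2 * gamma * Fraction(sum(h), 2))


ALPHA_U, BETA_U, GAMMA_U = -1, 1, -7  # hypothetical values

# (original codon, edited codon) pairs taken from the wheat COXII list.
SUBSTITUTIONS = [("CGG", "UGG"), ("UCA", "UUA"), ("UCG", "UUG"),
                 ("CGU", "UGU"), ("ACG", "AUG")]

lowered = {
    f"{before}->{after}": log_codon(after, ALPHA_U, BETA_U, GAMMA_U)
    < log_codon(before, ALPHA_U, BETA_U, GAMMA_U)
    for before, after in SUBSTITUTIONS
}
print(lowered)
```

For these pairs the edited codon gives the smaller exponent, so each substitution lowers the value of the function for this choice of parameters.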

In 0, three cases of substitution editing are reported in the co1 gene of Physarum
polycephalum. Also in this case the function $A'_0$ is minimized. However, this function differs
from the function $A_0$ of Eq. (1) used for the insertions in Physarum.

5 Conclusion 

We have shown that the nature of the inserted nucleotides and the position of the insertion
site can be explained by introducing a minimum principle in the framework of the crystal basis
model of the genetic code introduced in Ref. [8]. Indeed, we have made the assumption that, once
the final edited peptide chain is fixed, the nature and the position of the inserted nucleotide(s)
are such as to minimize the functions of Eqs. (1)-(3) or (4), where the numerical real coefficients
depend on the biological species, and the operators $C_{H,V}$ and $J_{3,H}$ have to be evaluated on
the edited codons using Table 1.

Our analysis shows that, in the case of Physarum polycephalum, in 110 of the 114 sites in
which the insertion of either C or U, and in all the cases in which also the insertion of a purine,
can produce the same amino acid, the observed mRNA editing makes use of the nucleotide (C or U)
which minimizes $A = A_0 A_1$. In the case of the U insertion in kinetoplastid protozoa genes, in
all the cases but two, the function $A'$ is minimized. This last function is also minimized in the
case of C → U substitution editing.

The form of the function assumed to be minimized has been suggested by simplicity and
ease of computation. For these reasons we have only considered a dependence on the values
of the Casimir operators $C_H$ and $C_V$, although generally there is a degeneracy in the
irreducible representations. We have also made the hypothesis that the effect of neighboring
nucleotides is weak and limited to the two preceding ones. As we said previously, we are first of
all looking for solid empirical grounds supporting the approach under consideration, and only
then for further mathematical refinements which may also give quantitative information.

We have not considered insertion of nucleotides different from C and U since the statistics is
very low. We have assumed that the constants $\alpha$, $\beta$, $\gamma$ depend on the biological
species. However, our analysis cannot exclude that they depend only on the type of the inserted
nucleotide. It would be interesting to analyze further data on mRNA editing in the analyzed as
well as in other biological species to check that the minimum principle is satisfied. Further
confirmation of the validity of our hypothesis would provide evidence in favor of the existence of
strong physico-chemical constraints in a domain generally believed to be dominated by random
events. The presence of a minimum principle, which would indicate the possible application
of variational principles in the field of complex biological systems, would be an amazing result.

In conclusion, our effective model does not explain why and where mRNA editing occurs, but
it seems to be able to determine the location sites and the nature of the inserted nucleotides, once
the amino acid chain is fixed.
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Table 1: The eukariotic code. The upper label denotes different irreducible representations. 



codon  a.a.  (J_H, J_V)     J_3,H  J_3,V     codon  a.a.  (J_H, J_V)     J_3,H  J_3,V
CCC    Pro   (3/2, 3/2)      3/2    3/2      UCC    Ser   (3/2, 3/2)      1/2    3/2
CCU    Pro   (1/2, 3/2)^1    1/2    3/2      UCU    Ser   (1/2, 3/2)^1   -1/2    3/2
CCG    Pro   (3/2, 1/2)^1    3/2    1/2      UCG    Ser   (3/2, 1/2)^1    1/2    1/2
CCA    Pro   (1/2, 1/2)^1    1/2    1/2      UCA    Ser   (1/2, 1/2)^1   -1/2    1/2
CUC    Leu   (1/2, 3/2)^2    1/2    3/2      UUC    Phe   (3/2, 3/2)     -1/2    3/2
CUU    Leu   (1/2, 3/2)^2   -1/2    3/2      UUU    Phe   (3/2, 3/2)     -3/2    3/2
CUG    Leu   (1/2, 1/2)^3    1/2    1/2      UUG    Leu   (3/2, 1/2)^1   -1/2    1/2
CUA    Leu   (1/2, 1/2)^3   -1/2    1/2      UUA    Leu   (3/2, 1/2)^1   -3/2    1/2
CGC    Arg   (3/2, 1/2)^2    3/2    1/2      UGC    Cys   (3/2, 1/2)^2    1/2    1/2
CGU    Arg   (1/2, 1/2)^2    1/2    1/2      UGU    Cys   (1/2, 1/2)^2   -1/2    1/2
CGG    Arg   (3/2, 1/2)^2    3/2   -1/2      UGG    Trp   (3/2, 1/2)^2    1/2   -1/2
CGA    Arg   (1/2, 1/2)^2    1/2   -1/2      UGA    Ter   (1/2, 1/2)^2   -1/2   -1/2
CAC    His   (1/2, 1/2)^4    1/2    1/2      UAC    Tyr   (3/2, 1/2)^2   -1/2    1/2
CAU    His   (1/2, 1/2)^4   -1/2    1/2      UAU    Tyr   (3/2, 1/2)^2   -3/2    1/2
CAG    Gln   (1/2, 1/2)^4    1/2   -1/2      UAG    Ter   (3/2, 1/2)^2   -1/2   -1/2
CAA    Gln   (1/2, 1/2)^4   -1/2   -1/2      UAA    Ter   (3/2, 1/2)^2   -3/2   -1/2
GCC    Ala   (3/2, 3/2)      3/2    1/2      ACC    Thr   (3/2, 3/2)      1/2    1/2
GCU    Ala   (1/2, 3/2)^1    1/2    1/2      ACU    Thr   (1/2, 3/2)^1   -1/2    1/2
GCG    Ala   (3/2, 1/2)^1    3/2   -1/2      ACG    Thr   (3/2, 1/2)^1    1/2   -1/2
GCA    Ala   (1/2, 1/2)^1    1/2   -1/2      ACA    Thr   (1/2, 1/2)^1   -1/2   -1/2
GUC    Val   (1/2, 3/2)^2    1/2    1/2      AUC    Ile   (3/2, 3/2)     -1/2    1/2
GUU    Val   (1/2, 3/2)^2   -1/2    1/2      AUU    Ile   (3/2, 3/2)     -3/2    1/2
GUG    Val   (1/2, 1/2)^3    1/2   -1/2      AUG    Met   (3/2, 1/2)^1   -1/2   -1/2
GUA    Val   (1/2, 1/2)^3   -1/2   -1/2      AUA    Ile   (3/2, 1/2)^1   -3/2   -1/2
GGC    Gly   (3/2, 3/2)      3/2   -1/2      AGC    Ser   (3/2, 3/2)      1/2   -1/2
GGU    Gly   (1/2, 3/2)^1    1/2   -1/2      AGU    Ser   (1/2, 3/2)^1   -1/2   -1/2
GGG    Gly   (3/2, 3/2)      3/2   -3/2      AGG    Arg   (3/2, 3/2)      1/2   -3/2
GGA    Gly   (1/2, 3/2)^1    1/2   -3/2      AGA    Arg   (1/2, 3/2)^1   -1/2   -3/2
GAC    Asp   (1/2, 3/2)^2    1/2   -1/2      AAC    Asn   (3/2, 3/2)     -1/2   -1/2
GAU    Asp   (1/2, 3/2)^2   -1/2   -1/2      AAU    Asn   (3/2, 3/2)     -3/2   -1/2
GAG    Glu   (1/2, 3/2)^2    1/2   -3/2      AAG    Lys   (3/2, 3/2)     -1/2   -3/2
GAA    Glu   (1/2, 3/2)^2   -1/2   -3/2      AAA    Lys   (3/2, 3/2)     -3/2   -3/2



Table 2: From the left: the a.a., the C insertion site, the codons coding for the a.a., the
dinucleotide preceding C; the shift with respect to the observed site of the alternative insertion
site, the new codons, the dinucleotide preceding C in the alternative site. Sites refer to fig. 3 of
[5], fig. 2 of [6] (with an asterisk *), fig. 2 of [7] (with a prime ').

a.a.       site           codons     dinucl.   shift   codons     dinucl.
Thr, Leu   9, 24, 55*     ACC, UUA   AC        +1      ACU, CUA   CU
Ile, Leu   23, 30*        AUC, UUG   AU        +1      AUU, CUG   UU
Ala, Phe   32             GCC, UUU   GC        +3      GCU, UUC   UU
Val, Phe   33, 45, 41*    GUC, UUU   GU        +3      GUU, UUC   UU
Ser, Arg   34             UCC, AGA   UC        +1      UCA, CGA   CA
Asn, Phe   12             AAU, UUC   UU        -3      AAC, UUU   AA
Ile, Leu   49, 48*, 20'   AUC, UUA   AU        +1      AUU, CUA   UU
Ala, Leu   5'             GCC, UUA   GC        +1      GCU, CUA   CU
Ser, Phe   43*, 13'       UCC, UUU   UC        +3      UCU, UUC   UU
Thr, Arg   3*             ACC, AGA   AC        +1      ACA, CGA   CA
Ser, Leu   18*            AGU, CUG   GU        -1      AGC, UUG   AG
Val, Leu   23*, 40*       GUC, UUA   GU        +1      GUU, CUA   UU
His, Leu   51*            CAU, CUA   AU        -1      CAC, UUA   CA



Table 3: From the left: the a.a., the codon created by C insertion, the alternative codon
created by alternative insertion, the site with reference to fig. 3 of [5], fig. 2 of [6] (with an
asterisk *), fig. 2 of [7] (with a prime '). Here X = U, A, G and R = A, G.

a.a.   codon   alt. codon   site
Asn    AAC     AAU          35, 4'
Thr    ACC     ACX          5, 7, 9, 10, 21, 24, 26, 36, 3*, 4*, 5*, 12*, 20*, 26*,
                            33*, 35*, 39*, 49*, 50*, 55*, 62*, 15', 39'
Ser    AGC     AGU          1*, 36*, 34'
Ile    AUC     AUU, AUA     1, 4, 13, 15, 17, 18, 20, 23, 38, 46, 49, 50, 51, 6*,
                            7*, 9*, 16*, 17*, 19*, 24*, 27*, 30*, 34*, 38*,
                            48*, 54*, 57*, 58*, 60*, 61*, 20', 32', 36', 37'
His    CAC     CAU          44*
Pro    CCC     CCX          17'
Arg    CGA     AGA          30
Leu    CUA     UUA          31, 40, 8*, 51*, 6'
Leu    CUG     UUG          18*
Leu    CUC     CUX          22
Leu    CUU     UUR          3, 13*, 21*, 47*, 8', 39'
Asp    GAC     GAU          54
Ala    GCC     GCX          25, 27, 29, 32, 37, 10*, 13*, 28*, 53*, 5', 16', 27', 30'
Val    GUC     GUX          2, 6, 11, 14, 33, 42, 45, 23*, 40*, 41*, 56*, 9', 21', 25'
Tyr    UAC     UAU          43
Ser    UCC     UCX          34, 42*, 2', 12', 13'
Phe    UUC     UUU          12, 52, 45*
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